3/8/2017

Getting the slides and data

What are htmlWidgets?

The htmlWidgets family of packages wraps JavaScript visualization libraries. In practice, this means you can create interactive visualizations from R, with the interactivity handled by JavaScript code that runs in the viewer's web browser.

You can include these interactive visualizations in several types of output:

  • RStudio's Viewer
  • RMarkdown documents rendered to HTML (including presentations)
  • Books or blogs developed using RMarkdown tools (bookdown and blogdown, respectively)
  • Package vignettes
  • Shiny web applications

Example data

For example data, I've pulled all listings of flood and tornado events in Colorado between 2013 and 2015 from NOAA's Storm Events Database. (Note: I used the noaastormevents package, whose development was led by Ziyu Chen, to do this directly from R.)

I've created two data sets, co_floods and co_tornadoes. Each row represents one event, and columns include the event's date, location, damage, and a few other details.

library(dplyr)
load("data/co_floods.Rdata")
colnames(co_floods)
##  [1] "begin_date"        "end_date"          "event_type"       
##  [4] "fips"              "cz_name"           "deaths_direct"    
##  [7] "injuries_direct"   "damage_property"   "damage_crops"     
## [10] "source"            "begin_lat"         "begin_lon"        
## [13] "end_lat"           "end_lon"           "flood_cause"      
## [16] "episode_narrative" "event_narrative"
co_floods %>% select(begin_date, event_type, cz_name, fips, begin_lat, begin_lon) %>% slice(1:3)
## # A tibble: 3 × 6
##   begin_date  event_type cz_name  fips begin_lat begin_lon
##       <date>       <chr>   <chr> <dbl>     <dbl>     <dbl>
## 1 2013-08-10 Flash Flood Fremont  8043   38.4858 -105.3768
## 2 2013-09-13       Flood  Pueblo  8101   38.4439 -104.5983
## 3 2013-09-13       Flood  Pueblo  8101   38.2708 -104.4557

The event_narrative column gives a longer text description of each event (and can be missing):

library(pander)
co_floods %>% select(begin_date, event_narrative) %>% sample_n(4) %>% 
  pander(split.cells = c(5, 50))
begin_date event_narrative
2013-09-11 Flash flooding was observed in the High Park burn area. It forced the closure of CO14, between Rustic and Teds Place. Other road closures included Larimer County Road 70 between County Roads 20 and 21.
2013-09-14 The combination of heavy rain, coupled with extremely saturated ground conditions, produced additional flash flooding.
2013-09-12 Six to eight inches of water was reported flowing over Highway 385 at County Road C. A semi truck almost lost control driving through the flooded section of road.
2014-07-19 Heavy rain on the Waldo Canyon burn scar caused flooding on US Highway 24, along with mud flows. The flooding extended down to Waldo Canyon and into Manitou Springs.

Mapping Colorado floods

It's pretty straightforward to use R functions to create a static map with this data:

library(ggplot2)
map_data("county", region = "colorado") %>% 
  ggplot(aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "darkgray", fill = "white") + 
  geom_point(data = co_floods, aes(x = begin_lon, y = begin_lat, group = NULL),
             color = "blue", alpha = 0.5) +
  coord_map() + theme_void()

Mapping Colorado floods and tornadoes

However, sometimes we might prefer to create something interactive:

Creating a basic leaflet map

The three basic components of a leaflet map are:

  • Leaflet widget
  • Background (tiles)
  • Mapping of data to locations
library(leaflet)
co_floods %>% 
  leaflet() %>%                                   # 1. leaflet widget
  addProviderTiles("OpenStreetMap.Mapnik") %>%    # 2. background (tiles)
  addCircleMarkers(lng = ~ begin_lon, lat = ~ begin_lat,   # 3. map data to locations
                   radius = 3, color = "blue")

Choices for map tiles

With addProviderTiles, you can choose from many different background map tiles. The leaflet-providers preview page shows the options and gives the provider names to use in the R call. Here is the same map using different map tiles:

co_floods %>% leaflet() %>% 
  addProviderTiles("NASAGIBS.ViirsEarthAtNight2012") %>% 
  addProviderTiles("CartoDB.DarkMatterOnlyLabels") %>%
  addCircleMarkers(lng = ~ begin_lon, lat = ~ begin_lat, radius = 3, color = "green")
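Provider names can change between leaflet releases. For example, the Stamen tiles used later in these slides have since moved to Stadia Maps. It can therefore help to check which names your installed version recognizes. The following is a minimal sketch using the providers list that leaflet exports; the CartoDB search is only an example:

```r
library(leaflet)

# `providers` is a named list that ships with leaflet; its names are the
# strings that addProviderTiles() accepts
provider_names <- names(providers)

# List the CartoDB variants, for example
carto <- grep("^CartoDB", provider_names, value = TRUE)
print(carto)

# Confirm a provider exists before using it in a map
"OpenStreetMap.Mapnik" %in% provider_names
```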

Choices for mapping data to locations

There are several types of elements you can add to a leaflet map to map data to locations. Each is added with a function named add followed by the element name (for example, addMarkers):

  • Markers: "Push pin" style markers
  • CircleMarkers: The circle markers shown in examples so far
  • LabelOnlyMarkers
  • Polylines
  • Circles
  • Rectangles
  • Polygons

You can layer several of these on the same leaflet map (e.g., roads with Polylines, counties with Polygons, exact locations with one of the markers).
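The following sketch layers a polyline, a rectangle, and circles on one set of tiles. The coordinates are hypothetical values near Fort Collins, chosen only for illustration, and addTiles() supplies leaflet's default OpenStreetMap background. Note that addCircles() takes its radius in meters, while addCircleMarkers() takes its radius in pixels.

```r
library(leaflet)

# Hypothetical points near Fort Collins, CO (illustration only)
pts <- data.frame(lng = c(-105.08, -105.05, -105.10),
                  lat = c(40.58, 40.56, 40.60))

m <- leaflet() %>%
  addTiles() %>%                                     # default OpenStreetMap tiles
  addPolylines(lng = c(-105.15, -105.00),            # a straight "road"
               lat = c(40.55, 40.62), color = "black") %>%
  addRectangles(lng1 = -105.12, lat1 = 40.54,        # an outlined study area
                lng2 = -105.02, lat2 = 40.62,
                fill = FALSE, color = "red") %>%
  addCircles(data = pts, lng = ~ lng, lat = ~ lat,   # radius in meters
             radius = 500, color = "blue")
```

Printing `m` displays the layered map.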

Data can be mapped from a dataframe with columns for longitude and latitude. In this case, specify those columns with the lng and lat arguments, using a formula (~ column_name):

leaflet() %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, 
                   radius = 3, color = "blue") %>% 
  addMarkers(data = co_tornadoes, lng = ~ begin_lon, lat = ~ begin_lat)

Alternatively, if you have data saved as a spatial object, you can map that directly without specifying latitude and longitude columns.

The tigris package lets you pull US Census TIGER/Line shapefiles directly from R. The following call pulls a shapefile with Colorado county boundaries (cb = TRUE requests the lower-resolution cartographic boundary version):

library(tigris)
co_counties <- counties(state = 'CO', cb = TRUE)
class(co_counties)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

You can pass this spatial object directly to a mapping function through the data argument:

leaflet() %>% 
  addProviderTiles("Stamen.TonerBackground") %>% 
  addPolygons(data = co_counties) %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, 
                   radius = 3, color = "green") 

Customizing icons

You can use a custom icon for the map markers. For example, this tornado icon was created by Gilad Fried and is available under a Creative Commons license. The following code uses it to mark the Colorado tornadoes (this assumes the icon has been saved locally as "figures/tornado.png"):

co_tornadoes %>% 
  leaflet() %>% addProviderTiles("OpenStreetMap.Mapnik") %>%  
  addMarkers(~ begin_lon, ~ begin_lat, 
             icon = makeIcon("figures/tornado.png", iconWidth = 20, iconHeight = 20))

Using cluster markers

When a leaflet map has many points, it can be hard to interpret until you zoom in. In this case, it often helps to pass clusterOptions = markerClusterOptions(), which groups nearby points into cluster markers until the map is zoomed in.

leaflet() %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, 
                   radius = 3, color = "blue", 
                   clusterOptions = markerClusterOptions()) 

Adding pop-ups

For any of these data mappings, you can add "pop-ups" that show information when a user clicks on a marker or shape. To do this, give the popup argument either a column from the dataframe you're mapping (as a formula, such as ~ begin_date) or a vector with one value per mapped feature. For example, the following call uses the beginning date of each flood in the pop-ups:

leaflet() %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, popup = ~ begin_date,
                   radius = 3, color = "green") 

Shapefiles often include attribute information, stored in the data slot of an sp object, that can be useful in a pop-up. For example, the data for co_counties includes the county name:

head(co_counties@data)
##     STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID        NAME LSAD
## 27       08      013 00198122 0500000US08013 08013     Boulder   06
## 28       08      029 00198130 0500000US08029 08029       Delta   06
## 29       08      059 00198145 0500000US08059 08059   Jefferson   06
## 30       08      091 00198161 0500000US08091 08091       Ouray   06
## 339      08      019 00198125 0500000US08019 08019 Clear Creek   06
## 340      08      023 00198127 0500000US08023 08023    Costilla   06
##          ALAND   AWATER
## 27  1881212055 36592000
## 28  2958007403 16886462
## 29  1979311263 25444831
## 30  1402657135  1599543
## 339 1023554877  3279667
## 340 3177806137  8828906

When mapping from a spatial object, you can reference values in its data slot:

leaflet() %>% 
  addProviderTiles("Stamen.TonerBackground") %>% 
  addPolygons(data = co_counties, popup = co_counties@data$NAME) 

Often, it can be helpful to paste together information from several columns of the dataframe to include in the pop-up:

co_floods %>% 
  leaflet() %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, 
                   popup = ~ paste("Date:", begin_date, "to", end_date),
                   radius = 3, color = "green") 

If you want to get even fancier, you can include HTML tags to style the text in the pop-ups:

# Build one HTML pop-up string per flood event; the opening <div> is
# closed at the end so the scrollable container is well-formed
co_floods <- co_floods %>% 
  mutate(popup_text = paste0("<div class='leaflet-popup-scrolled' style='max-height:150px'>",
                             "<b>County:  </b>", cz_name, "<br/>", 
                             "<b>Dates:  </b>", begin_date, " to ", end_date, "<br/>", 
                             "<b># deaths:  </b>", deaths_direct, "<br/>", 
                             "<b># injuries:  </b>", injuries_direct, "<br/>", 
                             "<b>Property damage:  </b>$", damage_property, "<br/>", 
                             "<b>Crop damage:  </b>$", damage_crops, "<br/>", 
                             event_narrative,
                             "</div>"))
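Two details can cause surprises with pop-ups built this way. First, paste0() turns a missing event_narrative into the literal text "NA". Second, characters such as & or < in free text are interpreted as HTML. The following is a minimal sketch of a helper that handles both, using htmltools::htmlEscape() (htmltools is installed along with leaflet). The name make_popup and the sample strings are hypothetical.

```r
library(htmltools)

# Hypothetical helper: one pop-up string per row.
# Missing narratives become empty strings, and free text is escaped so
# characters such as &, <, and > display literally instead of as HTML.
make_popup <- function(county, narrative) {
  narrative[is.na(narrative)] <- ""
  paste0("<div class='leaflet-popup-scrolled' style='max-height:150px'>",
         "<b>County:  </b>", htmlEscape(county), "<br/>",
         htmlEscape(narrative),
         "</div>")
}

popups <- make_popup(county = c("Pueblo", "Boulder"),
                     narrative = c("Water over Highway 50 & 96, <1 ft deep", NA))
popups
```

The result can be passed to the popup argument the same way as popup_text above.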

Here is the map using these fancier pop-ups:

co_floods %>% 
  leaflet() %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, popup = ~ popup_text,
                   radius = 3, color = "green") 

Setting the initial view

By default, the map initially zooms to a view that includes all the mapped data. To customize where the map is centered and how far it is zoomed in at first, use setView. For example, this call sets the initial map to show the Fort Collins area rather than all of Colorado:

co_floods %>% 
  leaflet() %>% setView(lng = -105.0844, lat = 40.5853, zoom = 9) %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, popup = ~ popup_text,
                   radius = 3, color = "blue") 
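setView() centers the map on a point at a fixed zoom level. If you would rather start from a bounding box, leaflet's fitBounds() takes the southwest and northeast corners. The following sketch uses the three flood locations printed earlier and computes the bounds from the range of those points:

```r
library(leaflet)

# The first three flood locations shown in the co_floods printout above
pts <- data.frame(lng = c(-105.3768, -104.5983, -104.4557),
                  lat = c(38.4858, 38.4439, 38.2708))

m <- leaflet(pts) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~ lng, lat = ~ lat, radius = 3) %>%
  # fitBounds() zooms to the rectangle with these two opposite corners
  fitBounds(lng1 = min(pts$lng), lat1 = min(pts$lat),
            lng2 = max(pts$lng), lat2 = max(pts$lat))
```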

Adding a minimap

If you set a tighter zoom like this but have data covering a wider area, you may want to include a mini-map to help users navigate. You can do this with addMiniMap:

co_floods %>% 
  leaflet() %>% setView(lng = -105.0844, lat = 40.5853, zoom = 9) %>% 
  addMiniMap(position = "topright") %>% 
  addProviderTiles("OpenStreetMap.Mapnik") %>% 
  addCircleMarkers(data = co_floods, lng = ~ begin_lon, lat = ~ begin_lat, popup = ~ popup_text,
                   radius = 3, color = "blue") 

The plotly library

The plotly package is another member of the htmlWidgets family. It is more general-purpose, with functions for creating many different types of interactive plots.

One particular appeal is that it can wrap ggplot objects, so you can create interactive visualizations very efficiently from code you already have for static plots.

For example, you can create a static time series of number of flood events by date in Colorado using this code:

library(dplyr)
library(lubridate)
library(ggplot2)
flood_ts <- co_floods %>% count(begin_date) %>% 
  # add a row for every day in 2013-2015 so days without floods plot as 0
  full_join(tibble(begin_date = seq(ymd("2013-01-01"), 
                                    ymd("2015-12-31"), by = 1)),
            by = "begin_date") %>% 
  mutate(n = ifelse(is.na(n), 0, n)) %>% arrange(begin_date) %>% 
  ggplot(aes(x = begin_date, y = n)) + 
  geom_line() + theme_classic() + 
  labs(x = "Date", y = "# of flood events\nin Colorado") +
  facet_wrap(~ year(begin_date), ncol = 1, scales = "free_x")
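The join against a complete sequence of dates, followed by replacing NA with 0, makes days with no floods show up as zeros instead of being skipped by geom_line(). The following is the same idea in base R, sketched with a few hypothetical event dates standing in for co_floods$begin_date:

```r
# Hypothetical event dates (stand-ins for co_floods$begin_date)
event_dates <- as.Date(c("2013-09-12", "2013-09-12", "2013-09-14"))

# Every day in the window, including days with no events
all_days <- seq(as.Date("2013-09-11"), as.Date("2013-09-15"), by = "day")

# Tabulating a factor whose levels are all days keeps the zero-count days
counts <- table(factor(as.character(event_dates),
                       levels = as.character(all_days)))
flood_counts <- data.frame(begin_date = all_days,
                           n = as.integer(counts))
flood_counts
```

tidyr's complete(), with fill = list(n = 0), is another common way to fill in the missing days.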

Flood timeseries

flood_ts 

Then you can use ggplotly to transform that ggplot object to an interactive graphic:

library(plotly)
ggplotly(flood_ts)
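By default, ggplotly() shows every mapped aesthetic in the hover text. Its tooltip argument limits what appears. The following sketch uses a small made-up data set standing in for the flood counts:

```r
library(ggplot2)
library(plotly)

# Made-up daily counts, for illustration only
toy <- data.frame(begin_date = seq(as.Date("2013-09-10"), by = "day", length.out = 5),
                  n = c(0, 3, 7, 2, 0))
p <- ggplot(toy, aes(x = begin_date, y = n)) + geom_line()

# Show only the x (date) and y (count) values when hovering
interactive_p <- ggplotly(p, tooltip = c("x", "y"))
```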

Tornado count vs. county population

You can also create plotly graphics "from scratch". This uses piping to build up the plot step by step, similar to the way layers are added with + in ggplot2.

As an example, we might want to figure out:

  • Are more tornadoes reported in counties with larger populations?
  • Are total tornado property damages higher in counties with larger populations?

The choroplethr package includes a data set with US county populations (df_pop_county). (You could also pull these data with the acs package or a similar tool, but you'd need a Census API key.)

data(df_pop_county, package = "choroplethr")
head(df_pop_county)
##   region  value
## 1   1001  54590
## 2   1003 183226
## 3   1005  27469
## 4   1007  22769
## 5   1009  57466
## 6   1011  10779

Property damages are stored as text with a magnitude suffix (for example, "5.00K" for $5,000):

head(co_tornadoes$damage_property, 20)
##  [1] "0.00K" "0.00K" "5.00K" "0.00K" "0.00K" "0.00K" "0.00K" "0.00K"
##  [9] "0.00K" "0.00K" "0.00K" "0.00K" "0.00K" "0.00K" "0.00K" "0.00K"
## [17] "0.00K" "0.00K" "0.00K" "0.00K"

But there's a function in noaastormevents we can use to parse those to numeric values:

head(noaastormevents::parse_damage(co_tornadoes$damage_property), 20)
##  [1]    0    0 5000    0    0    0    0    0    0    0    0    0    0    0
## [15]    0    0    0    0    0    0
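The following hypothetical base-R sketch shows what that parsing involves. It assumes the K (thousand), M (million), and B (billion) suffix convention and multiplies the number by the matching factor. It is not the code inside noaastormevents::parse_damage(), which may handle more cases, and the name parse_damage_sketch is made up for this example.

```r
# Hypothetical re-implementation for illustration; NOT the code inside
# noaastormevents::parse_damage()
parse_damage_sketch <- function(x) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  x <- toupper(trimws(x))
  suffix <- substr(x, nchar(x), nchar(x))          # last character, e.g. "K"
  has_suffix <- suffix %in% names(multipliers)
  # Drop the suffix (when there is one) before converting to a number
  number <- as.numeric(ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x))
  mult <- ifelse(has_suffix, multipliers[suffix], 1)
  number * mult
}

parse_damage_sketch(c("0.00K", "5.00K", "1.5M", "250"))
```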

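You can count the number of tornadoes in each county and join those counts to the population data. Counties with no reported tornadoes get a count of 0. One dollar is also added to every property damage total, so counties with no damage give 0 rather than -Inf when damage is log-transformed in the plots below: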

library(stringr)
county_tornadoes <- co_tornadoes %>% 
  mutate(damage_property = noaastormevents::parse_damage(damage_property)) %>% 
  group_by(cz_name) %>% 
  summarize(fips = first(fips), 
            tornado_count = n(),
            damage_property = sum(damage_property)) %>% 
  right_join(df_pop_county %>% mutate(is_co = str_detect(region, "^8")) %>% 
               filter(is_co) %>% rename(population = value) %>% select(-is_co), 
             by = c("fips" = "region")) %>% 
  mutate(tornado_count = ifelse(is.na(tornado_count), 0, tornado_count),
         damage_property = ifelse(is.na(damage_property), 1, damage_property + 1))
head(county_tornadoes, 5)
## # A tibble: 5 × 5
##    cz_name  fips tornado_count damage_property population
##      <chr> <dbl>         <dbl>           <dbl>      <dbl>
## 1    Adams  8001            12           15001     442996
## 2  Alamosa  8003             2               1      15750
## 3 Arapahoe  8005             4               1     574357
## 4     <NA>  8007             0               1      12109
## 5     Baca  8009             6            3001       3783
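The str_detect(region, "^8") filter works here because no state FIPS code falls in the 80s, but it depends on turning the numeric codes into text. Numeric county FIPS codes equal the state code × 1000 plus the county code, so integer division recovers the state directly. The following is a small sketch; the helper name is hypothetical:

```r
# Numeric county FIPS = state code * 1000 + county code,
# so %/% 1000 gives the state code (Colorado is 8)
is_colorado_fips <- function(fips) fips %/% 1000 == 8

is_colorado_fips(c(1001, 8001, 8125, 18001, 48001))
```

In the pipeline above, `filter(region %/% 1000 == 8)` would play the same role as the str_detect() step.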

You can use piping to create a plotly object, map attributes of the plot to elements of the data, and then add and change elements of the object (add markers, adjust axes, etc.).

co_plot <- county_tornadoes %>% 
  plot_ly(x = ~ population, y = ~ tornado_count) %>% 
  add_markers(color = ~ log10(damage_property), 
              alpha = 0.6,
              text = ~ paste0(cz_name, " (FIPS: ", fips, ")"), 
              hoverinfo = "x+y+text") %>% 
  colorbar(title = "Log of property damage") %>% 
  layout(title = "Colorado tornadoes by county",
         xaxis = list(title = "Population", showgrid = FALSE, type = "log"),
         yaxis = list(title = "# of tornadoes (2013-2015)", showgrid = FALSE))

co_plot 

You can also create 3-D scatter plots with plotly:

county_tornadoes %>% 
  plot_ly(x = ~ log10(population), y = ~ tornado_count, 
          z = ~ log10(damage_property)) %>% 
  add_markers(size = I(4), text = ~ paste0(cz_name, " (FIPS: ", fips, ")"), 
              hoverinfo = "x+y+text")

Difference between Shiny and htmlWidgets

Shiny allows you to power web applications with R code run on a server. The interactive graphics created with htmlWidgets, on the other hand, are interactive through JavaScript code run in the viewer's web browser.

Image source: http://mi-linux.wlv.ac.uk/

This means that htmlWidgets graphics can be viewed without being linked to a Shiny server. (It also means that the data behind the graphic is sent to viewers along with it, so be careful with sensitive data.)
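One way to see this is to save a widget as a standalone HTML file that any browser can open, with no R session or server running. The following sketch uses htmlwidgets::saveWidget(); the file name is arbitrary. With selfcontained = FALSE, the JavaScript dependencies are written to a folder next to the HTML file. selfcontained = TRUE instead bundles everything into one file but may require pandoc, depending on your setup.

```r
library(leaflet)
library(htmlwidgets)

# The Fort Collins view from the setView slide, without the flood data
m <- leaflet() %>%
  addTiles() %>%
  setView(lng = -105.0844, lat = 40.5853, zoom = 9)

out_file <- file.path(tempdir(), "fort_collins_map.html")

# Writes the HTML file plus a folder of JavaScript/CSS dependencies
saveWidget(m, file = out_file, selfcontained = FALSE)
file.exists(out_file)
```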

Sources / Find out more

Many of the packages in the htmlWidgets family were developed at RStudio. Both the overall documentation for htmlWidgets and documentation for specific packages in the family are typically exceptional.

These sources were used in developing these slides and are also excellent references for finding out more:

More widgets